A Unified Approach to
Global Convergence of Trust-Region
Methods for Nonsmooth Optimization

J.E.  Dennis 
Shou-Bai  Li 
and 

R.A.  Tapia 

September  1989 
(revised  July  1993) 


TR89-5

(to  appear  in  Mathematical  Programming) 


Computational and Applied Mathematics Department, Rice University,
6100 Main Street MS 134, Houston, TX 77005-1892

Approved  for  public  release;  distribution  unlimited 


A  Unified  Approach  to  Global 
Convergence  of  Trust  Region  Methods  for 
Nonsmooth  Optimization* 


John  E.  Dennis  Jr.,  Shou-Bai  B.  Li  and  Richard  A.  Tapia* 


Key Words: nonsmooth optimization, trust region methods, global convergence

Abbreviated Title: Global Convergence of Trust Region Methods


Abstract 

This paper investigates the global convergence of trust region (TR)
methods for solving nonsmooth minimization problems. For a class
of nonsmooth objective functions called regular functions, conditions
are found on the TR local models that imply three fundamental
convergence properties. These conditions are shown to be satisfied by
appropriate forms of Fletcher’s TR method for solving constrained
optimization problems, Powell and Yuan’s TR method for solving
nonlinear fitting problems, Zhang, Kim and Lasdon’s successive linear
programming method for solving constrained problems, Duff, Nocedal
and Reid’s TR method for solving systems of nonlinear equations, and
El Hallabi and Tapia’s TR method for solving systems of nonlinear
equations. Thus our results can be viewed as a unified convergence
theory for TR methods for nonsmooth problems.


1  Introduction


Trust  region  methods  (TR)  are  an  important  class  of  iterative  methods  for 
solving  nonlinear  optimization  problems.  In  an  unconstrained  minimization 
problem,  a  step  to  a  new  iterate  is  obtained  by  minimizing  a  local  model  of  the 
objective  function  over  a  restricted  region  centered  about  the  current  iterate. 
The  size  of  this  restricted  region  depends  on  how  well  the  local  model  predicts 
the  behavior  of  the  objective  function.  This  strategy  will  force  convergence 
of  the  iterates  from  an  arbitrary  starting  point  to  a  point  which  satisfies  the 
first-order necessary conditions. Motivation and a survey of TR methods for
the smooth case can be found in Moré (1983); see also Chapter 6 of Dennis
and Schnabel (1983) for unconstrained problems. In the past decade, many
trust  region  methods  for  minimizing  a  nonsmooth  objective  function  have 
been  proposed  and  applied  to  the  nonlinear  equations  problem,  the  nonlinear 
fitting  problem,  and  the  constrained  optimization  problem. 

In this paper we consider trust region methods for the unconstrained
minimization of a nonsmooth function, i.e.,

$$\min f(x) \tag{1}$$

where $f : R^n \to R$ is a nonsmooth function and may represent the objective
function of a nonsmooth unconstrained optimization problem or a nonsmooth
penalty function for a constrained optimization problem, such as the
$\ell_\infty$ norm or $\ell_1$ norm penalty function.

We begin the TR iteration from a starting point $x_0$ which may not be
close to a solution of (1). Let $L_0 = \{x \mid f(x) \le f(x_0)\}$ be the level
set of $f$ at $x_0$. We build at each iteration a local model $m(x_k, p_k)(s)$
which is an approximation of $f(x_k + s)$ for small $s$, where $p_k \in R^l$ is an
$l$-dimensional parameter vector which may change from iteration to iteration.
For example, $p_k$ might specify model curvature. Next we approximately solve
the subproblem

$SUB(x_k, p_k; \delta_k)$:

$$\min\ m(x_k, p_k)(s) \quad \text{s.t.} \quad \|s\| \le \delta_k,$$

to obtain a trial step $s_k$ that satisfies

$$\|s_k\| \le \delta_k \tag{2}$$

and

$$m(x_k, p_k)(0) - m(x_k, p_k)(s_k) \ge r\,[m(x_k, p_k)(0) - m(x_k, p_k)(s_k^*)] \ge 0 \tag{3}$$

where $s_k^*$ is a solution of $SUB(x_k, p_k; \delta_k)$, i.e.,

$$s_k^* \in \operatorname{argmin}\{m(x_k, p_k)(s) \mid \|s\| \le \delta_k\},$$

and $0 < r \le 1$ is a fixed constant. The positive number $\delta_k$ is called the
TR radius. We accept the step $s_k$ and set $x_{k+1} = x_k + s_k$ if

$$r_k = \frac{ared_k}{pred_k} \ge c_0 \tag{4}$$

where $c_0$ is a fixed constant in $(0, 1)$, and

$$ared_k = f(x_k) - f(x_k + s_k) \quad \text{(actual reduction)}$$

$$pred_k = f(x_k) - m(x_k, p_k)(s_k) \quad \text{(predicted reduction)}.$$


Otherwise we repeat this process using a smaller $\delta_k$ in
$SUB(x_k, p_k; \delta_k)$. This leads us to the following basic TR algorithm:


at the $k$-th iteration,

STEP 1 : approximately solve the subproblem $SUB(x_k, p_k; \delta_k)$
to obtain $s_k$ satisfying (2) and (3);

STEP 2 : compute $r_k$ according to (4);

STEP 3 : if $r_k < c_0$, then set $x_{k+1} = x_k$, $p_{k+1} = p_k$, reduce $\delta_k$
by $\rho_0 \delta_k$ and go to STEP 1;

otherwise, set $x_{k+1} = x_k + s_k$, update the TR radius $\delta_k$ to
$\delta_{k+1}$, update $p_k$ to $p_{k+1}$ and go to STEP 1;


where $1 > c_0 > 0$ and $1 > \rho_0 > 0$ are fixed constants.

This is only a conceptual TR algorithm, since we have omitted
details needed to specify a complete procedure, for example, a stopping
criterion, an updating rule for $\delta_k$ and a numerical method for determining
$s_k$ in STEP 1.
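The conceptual algorithm becomes runnable once the omitted details are fixed. The following Python sketch is one such instance and not the authors' implementation. It assumes a smooth objective, the linear model $m(x_k)(s) = f(x_k) + \nabla f(x_k)^T s$ (no parameter vector), and the $\ell_\infty$ norm in the trust region, so that the subproblem has the closed-form solution $s_k^* = -\delta_k\,\mathrm{sign}(\nabla f(x_k))$ and the decrease criterion holds with $r = 1$. The radius reduction is interpreted as multiplication by a constant `rho0`, the radius is left unchanged after an accepted step, and the stopping test on the predicted reduction is an assumption. All names (`basic_trust_region`, `c0`, `rho0`, `tol`) are illustrative.

```python
import numpy as np

def basic_trust_region(f, grad, x0, delta0=1.0, c0=0.1, rho0=0.5,
                       tol=1e-10, max_iter=1000):
    """Basic TR iteration with a linear model and an l-infinity trust region."""
    x = np.asarray(x0, dtype=float)
    delta = delta0
    for _ in range(max_iter):
        g = grad(x)
        # STEP 1: exact minimizer of f(x) + g^T s over ||s||_inf <= delta.
        s = -delta * np.sign(g)
        pred = -float(g @ s)  # m(0) - m(s) = delta * ||g||_1 >= 0
        if pred <= tol:       # assumed stopping test: (near) zero model decrease
            break
        # STEP 2: ratio of actual to predicted reduction.
        r_k = (f(x) - f(x + s)) / pred
        # STEP 3: reject the step and shrink the radius, or accept the step.
        if r_k < c0:
            delta *= rho0     # assumed interpretation of "reduce delta"
        else:
            x = x + s         # radius kept unchanged on acceptance (assumption)
    return x, delta

# Usage on a smooth convex quadratic whose minimizer is (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])
x_final, delta_final = basic_trust_region(f, grad, [0.0, 0.0])
```

On this example the sketch accepts two steps and then stops because the model predicts no further decrease. A practical method would also enlarge the radius after very successful steps; that rule is deliberately left out here.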

An approximate solution $s_k$ of the TR subproblem $SUB(x_k, p_k; \delta_k)$ is
required to satisfy criterion (3). This implies that the step $s_k$ attains at
least a fixed fraction $r$ of the optimal decrease that can be obtained from
the TR subproblem. The exact solution $s_k^*$ of the TR subproblem appears
in (3) only for comparative purposes. We expect that in general it will not
be necessary to compute the exact solution in STEP 1. In the smooth case,
Byrd, Schnabel and Shultz (1988) prove under a mild assumption that the
widely used sufficient decrease criterion for an approximate solution of the
TR subproblem implies (3). If the TR subproblem can be transformed into a
linear programming problem, criterion (3) may be checked using information
from the dual problem. There is an advantage to using criterion (3) in the
nonsmooth case because it does not require gradient information.
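As an illustration of criterion (3), the following sketch checks whether a candidate step lies in the trust region and attains a fraction $r$ of the optimal model decrease. It assumes a linear model $m(s) = f_x + g^T s$ and the $\ell_\infty$ norm, for which the exact solution $s^* = -\delta\,\mathrm{sign}(g)$ is known in closed form. The function names and the data are hypothetical.

```python
import numpy as np

def model_decrease(g, s):
    """m(0) - m(s) for the linear model m(s) = f_x + g^T s."""
    return -float(np.dot(g, s))

def satisfies_criteria(g, s, delta, r):
    """Check (2), ||s||_inf <= delta, and (3), the fraction-of-decrease test."""
    s_star = -delta * np.sign(g)          # exact solution for this model and norm
    optimal = model_decrease(g, s_star)   # equals delta * ||g||_1 >= 0
    within_region = np.max(np.abs(s)) <= delta
    return bool(within_region and model_decrease(g, s) >= r * optimal)

g = np.array([3.0, -1.0])
delta = 2.0
s_star = -delta * np.sign(g)              # the exact subproblem solution
half_step = 0.5 * s_star                  # attains half of the optimal decrease
```

Here `half_step` passes the test for any $r \le 0.5$ and fails it for larger $r$, which is the sense in which $r$ measures the quality required of an approximate solution.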

This paper focuses on a unified approach to the global convergence of our
basic TR algorithm. We will attempt to identify some general assumptions
on the objective function and the local model that will allow us to
establish the following three fundamental convergence properties of the basic
TR algorithm:

1. An iterate $x_k$ is a stationary point of $f$ in (1) if, for $\delta_k > 0$, the
step $s_k = 0$ is obtained in STEP 1.

2. Reducing $\delta_k$ in STEP 3 eventually guarantees $r_k \ge c_0$ where
$1 > c_0 > 0$. Equivalently, if the basic TR algorithm loops infinitely often
between STEP 1 and STEP 3 with $x_{k+1} = x_k$, then the current iterate $x_k$
must be a stationary point of $f$ in (1).

3. Any accumulation point of $\{x_k\}$ generated by the basic TR algorithm
is a stationary point of $f$ in (1).

In this way we will have obtained a general convergence theory for TR
methods. We will then show that we have a useful theory by demonstrating that
our theory can handle various TR methods which appear in the literature.
The first convergence property is considered in §2. In §2 we also introduce
some notation and terminology for nonsmooth functions, and the
assumptions that will lead us to a unified global convergence theory. A brief
survey of several nonsmooth TR methods is contained in §3. We show that these
TR methods satisfy the assumptions introduced in §2. The second and the third
convergence properties are considered in §4. In §5, we make some concluding
remarks.


2  Assumptions 


The  convergence  analysis  presented  in  this  paper  is  based  on  some  reasonable 
assumptions  on  the  objective  functions  and  the  local  models  employed  in 
these  TR  methods.  We  introduce  notation  in  §2.1,  state  our  assumptions  in 
§2.2  and  derive  several  properties  which  are  related  to  these  assumptions  in 
§2.3  and  §2.4. 


2.1  Notation and Terminology for Nonsmooth Functions

For our applications, it is reasonable to assume that the objective function
$f(x)$ and the local model $m(x,p)(s)$ are always finite, or $f(x) < \infty$ for at
least one point $x$ in the level set $L_0$ and $m(x,p)(s) < \infty$ at $s = 0$. The
bulk of the material listed below comes from Clarke (1983), Chapter 2.


Definition 2.1 A function $f : R^n \to R$ is said to be (locally) Lipschitz near
$x$ if for some constants $K > 0$ and $\epsilon > 0$, $f$ satisfies the Lipschitz
condition

$$|f(x_1) - f(x_2)| \le K \|x_1 - x_2\|$$

for all $x_1$ and $x_2$ in the $\epsilon$-neighborhood
$N(x) = \{y \mid \|y - x\| < \epsilon\}$ of $x$, where $\|\cdot\|$ is a given norm on
$R^n$. The function $f$ is said to be (locally) Lipschitz on a set $U$ if it is
(locally) Lipschitz near every point $x \in U$.

Definition 2.2 The generalized directional derivative of $f$ at $x$ in the
direction $d \in R^n$ is

$$f^\circ(x; d) = \limsup_{y \to x,\ t \downarrow 0} \frac{f(y + td) - f(y)}{t}.$$

Lemma 2.3 Let $f$ be Lipschitz with constant $K$ near $x$. Then for every
$d \in R^n$, $f^\circ(x; d)$ exists and

$$|f^\circ(x; d)| \le K \|d\|.$$


Definition 2.4 The generalized gradient of $f$ at $x$ is the set

$$\partial f(x) = \{g \in R^n \mid g^T d \le f^\circ(x; d),\ \forall d \in R^n\}.$$


Definition 2.5 A point $x$ is said to be a stationary point of $f$ if
$0 \in \partial f(x)$.


Theorem 2.6 (first-order necessary condition) Let $f$ be Lipschitz near
$x$. If $f$ attains a local minimum at $x$, then $0 \in \partial f(x)$, i.e., $x$ is a
stationary point of $f$.


Definition 2.7 A function $f$ that is Lipschitz near $x$ is said to be regular at
$x$ if the one-sided directional derivative

$$f'(x; d) = \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}$$

exists for all directions $d \in R^n$ and

$$f'(x; d) = f^\circ(x; d).$$

The function $f$ is said to be regular on a set $U$ if it is regular at every point
of the set $U$.
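To make the distinction between $f'(x; d)$ and $f^\circ(x; d)$ concrete, the sketch below compares crude numerical estimates of both at $x = 0$ for two one-dimensional functions chosen for illustration: $|x|$, which is convex, and $-|x|$, which is Lipschitz but not regular at $0$. The estimators, their names, and the sampling parameters are assumptions made for this example; in particular, the generalized derivative is only approximated by maximizing the difference quotient over a finite grid of base points $y$ near $x$ and small step sizes $t$.

```python
import numpy as np

def one_sided_derivative(f, x, d, t=1e-8):
    """Difference-quotient estimate of f'(x; d) for scalar x."""
    return (f(x + t * d) - f(x)) / t

def generalized_derivative_estimate(f, x, d, radius=1e-4, n=201):
    """Crude estimate of the generalized directional derivative at scalar x:
    maximize the difference quotient over y near x and small t > 0."""
    ys = x + np.linspace(-radius, radius, n)
    ts = radius * np.logspace(-3, -1, 10)
    return max((f(y + t * d) - f(y)) / t for y in ys for t in ts)

convex_kink = abs                     # f(x) = |x|
concave_kink = lambda x: -abs(x)      # f(x) = -|x|

results = {
    name: (one_sided_derivative(f, 0.0, 1.0),
           generalized_derivative_estimate(f, 0.0, 1.0))
    for name, f in (("abs", convex_kink), ("neg_abs", concave_kink))
}
```

For $|x|$ the two estimates agree (both are close to 1), as regularity requires. For $-|x|$ the one-sided quotient is $-1$ while the generalized estimate is close to $1$, so the two derivatives differ and the function is not regular at $0$.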


Since the nonsmooth functions discussed in this paper are always assumed
to be locally Lipschitz, we defined regularity in Definition 2.7 only for locally
Lipschitz functions. This is slightly different from the definition in Clarke’s
book. Convex functions defined on open convex sets are locally Lipschitz, as
is demonstrated by the following theorem. Thus, by Lemma 2.3, a convex
function defined on an open convex set always has a generalized directional
derivative. From convex analysis (for example, see Theorem 23.1 on page 213
of Rockafellar (1970)), a convex function always has a one-sided directional
derivative. The following theorem says that these two derivatives coincide
for a convex function.


Theorem 2.8 Let $U$ be an open convex set in $R^n$ and let $f$ be a convex
function on $U$ with $f(x) < +\infty$ for some $x \in U$. Then $f$ is Lipschitz near
any point $x$ in $U$ and $f^\circ(x; d)$ coincides with $f'(x; d)$ for all $d$ in $R^n$.
Thus $f$ is regular on $U$.


For composite functions, we have a similar result.


Corollary 2.9 Let $U$ be an open convex set in $R^n$ and let $f$ be a convex
function on $U$ with $f(y) < +\infty$ for some $y \in U$. Let
$g : V \subset R^m \to U \subset R^n$ be continuously differentiable on $V$. Then the
composite function $c(x) = f(g(x))$ is Lipschitz near any point $x$ in $V$ and
$c^\circ(x; d)$ coincides with $c'(x; d)$ for all $d$ in $R^m$. Thus the composite
function is regular on $V$.


While the following characterization of stationary points does not appear
explicitly in Clarke’s book, it is well known and quite useful. In a
minimization problem, the following lemma says that a point is a stationary
point of a locally Lipschitz function if and only if, at this point, there do
not exist any descent directions for the function.


Lemma 2.10 Assume that $f$ is Lipschitz near $x$. Then $0 \in \partial f(x)$ if and
only if $f^\circ(x; d) \ge 0$ for all $d$ in $R^n$.


Proof. The existence of $f^\circ(x; d)$ is guaranteed by Lemma 2.3. The
proof now follows in a direct manner from Definition 2.4. □
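For completeness, the direct argument can be written out in one line. It uses only the definition of the generalized gradient with the particular choice $g = 0$:

```latex
\begin{align*}
0 \in \partial f(x)
  &\iff 0^T d \le f^\circ(x; d) \quad \forall d \in R^n
     && \text{(definition of } \partial f(x) \text{ with } g = 0\text{)} \\
  &\iff f^\circ(x; d) \ge 0 \quad \forall d \in R^n .
\end{align*}
```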
2.2  Assumptions

Let $L_0 = \{x \mid f(x) \le f(x_0)\}$ denote the level set of $f$ in problem (1) at
the starting point $x_0$, and let $m(x,p)'(0; d)$ (or $m(x,p)^\circ(0; d)$) be the
one-sided (or generalized) directional derivative of $m(x,p)(s)$ with respect to
$s$ at $s = 0$ along the direction $d \in R^n$ for $p \in P \subset R^l$. We employ the
following basic assumptions on the objective function and the TR local models.

Assumption 2.1 : The objective function $f$ is regular on $L_0$.

Assumption 2.2 : For every $(x,p) \in L_0 \times P$, the local model
$m(x,p)(s)$ is regular with respect to $s \in R^n$.

Assumption 2.3 : For $(x,p) \in L_0 \times P$, the local model $m(x,p)(s)$
satisfies

$$m(x,p)(0) = f(x)$$

and

$$m(x,p)^\circ(0; d) = f^\circ(x; d), \quad \forall d \in R^n \setminus \{0\}.$$

Assumption 2.4 : For every $s \in R^n$, the local model $m(x,p)(s)$
is continuous in $(x,p)$.

Assumption 2.5 : The set $P$ of possible parameter vectors is
bounded.


A general theory cannot accommodate the delicate details of all
applications. In this sense Assumption 2.5 is restrictive. For example, Powell
(1984) and Yuan (1983) developed convergence theories without assuming bounded
parameters. However, as we shall see, in many applications Assumption 2.5
is realistic.

When the local model is convex in $s$, Assumption 2.2 is satisfied, since
convexity implies regularity by Theorem 2.8.

Assumption 2.3 requires every local model to be at least a first-order
approximation to the objective function. We observe that no uniformity in
$(x,p)$ is involved in this assumption. In general, we are free in each TR
iteration to choose the parameter vector $p_k$ provided that the local models
always remain first-order approximations to the objective function and the
associated parameter vector $p_k$ remains in the bounded set $P$. In order to
verify Assumption 2.3 for a TR method, we will need to know something about
the structure of the local model.

The locally Lipschitz level of generalization for nonsmooth objective
functions makes it possible to analyze TR methods involving some nonsmooth
composite functions other than polyhedral norms. For example, the function

$$c(x) = \max(0, \min(x_1, x_2))$$

where $x = (x_1, x_2)$, is locally Lipschitz but is not a polyhedral norm. The
TR method for solving the optimization problem

$$\min c(F(x))$$

where $F(x) = (f_1(x_1, x_2), f_2(x_1, x_2))$ is a smooth function, may use the
local model, with first-order approximation in $F$,

$$m(x)(s) = c(F(x) + F'(x)s).$$

The resulting TR subproblem can be converted to a linear programming
problem and can be approximately solved at each iteration by a simplex
method or an interior point method. See Li (1989), Chapter 5, for details of
the convergence analysis. In this paper we deal only with regular objective
functions and we do not consider methods for solving the local models.
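The composite local model $m(x)(s) = c(F(x) + F'(x)s)$ is simple to form in code: the smooth inner map is linearized and the nonsmooth outer function is kept exact. In the sketch below the inner map `F` and its Jacobian are hypothetical choices made only for illustration; the text does not specify them.

```python
import numpy as np

def c(y):
    """Nonsmooth outer function c(y) = max(0, min(y1, y2))."""
    return max(0.0, min(y[0], y[1]))

def F(x):
    """Hypothetical smooth inner map F : R^2 -> R^2 (illustration only)."""
    return np.array([x[0] ** 2 + x[1], np.sin(x[0]) + 2.0 * x[1]])

def F_jacobian(x):
    """Jacobian F'(x) of the hypothetical map above."""
    return np.array([[2.0 * x[0], 1.0],
                     [np.cos(x[0]), 2.0]])

def local_model(x, s):
    """m(x)(s) = c(F(x) + F'(x) s): linearize F, keep c exact."""
    return c(F(x) + F_jacobian(x) @ s)

x = np.array([1.0, 0.5])
s = np.array([1e-3, -2e-3])
objective_at_x = c(F(x))
model_at_zero = local_model(x, np.zeros(2))       # agrees with c(F(x))
model_error = abs(c(F(x + s)) - local_model(x, s))
```

Because $c$ is Lipschitz, the model error is bounded by a multiple of the linearization error of $F$, which is of second order in $\|s\|$ for this smooth choice of $F$.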


2.3  Preventing False Termination of the TR Algorithm

The TR subproblem must be posed so as to avoid false termination, i.e., if
$s_k$ obtained from the basic TR algorithm in STEP 1 with $\delta_k > 0$ happens to
be zero, then the algorithm has converged, i.e., $x_k$ is a stationary point of
$f$ in (1). We first show that obtaining $s = 0$ from the basic TR algorithm in
STEP 1 with $\delta > 0$ is equivalent to the fact that $s^* = 0$ solves the TR
subproblem $SUB(x,p;\delta)$ with $\delta > 0$, i.e., if the approximate solution
satisfying the two conditions (2) and (3) is zero, then so is the exact
solution.

Lemma 2.11 Let $\delta > 0$ be the TR radius in the subproblem $SUB(x,p;\delta)$. If
the step $s$ obtained from the basic TR algorithm in STEP 1 is equal to zero,
then $s^* = 0$ solves $SUB(x,p;\delta)$.

Proof. Recall that the step $s$ obtained from the basic TR algorithm in
STEP 1 satisfies (2)

$$\|s\| \le \delta$$

and (3)

$$m(x,p)(0) - m(x,p)(s) \ge r\,[m(x,p)(0) - m(x,p)(s^*)] \ge 0. \tag{5}$$

If $s = 0$, then $m(x,p)(0) - m(x,p)(s^*) = 0$, which implies $s^* = 0$ solves the
subproblem. □
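The step from $s = 0$ to $m(x,p)(0) - m(x,p)(s^*) = 0$ relies on $r > 0$ and on the feasibility of the zero step. Spelled out under the notation of the lemma:

```latex
\begin{align*}
s = 0 \;&\Longrightarrow\; 0 = m(x,p)(0) - m(x,p)(s)
          \ge r\,[\,m(x,p)(0) - m(x,p)(s^*)\,] \ge 0 \\
        &\Longrightarrow\; m(x,p)(0) = m(x,p)(s^*)
          \qquad (\text{since } r > 0) \\
        &\Longrightarrow\; 0 \in \operatorname{argmin}\{\, m(x,p)(s) \mid \|s\| \le \delta \,\}
          \qquad (\text{since } \|0\| = 0 \le \delta).
\end{align*}
```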

We will now show that Assumptions 2.1 through 2.3 are sufficient for
preventing false termination.

Lemma 2.12 Under Assumptions 2.1 through 2.3, if $s^* = 0$ solves the
subproblem $SUB(x,p;\delta)$ with $\delta > 0$, then $x$ is a stationary point of $f$.

Proof. Let $x$ and $p$ be fixed. If $s^* = 0$ solves $SUB(x,p;\delta)$ with
$\delta > 0$, then $s^* = 0$ is a local minimizer of $m(x,p)(s)$. From the first-order
necessary condition (Theorem 2.6), $s^* = 0$ is a stationary point of $m(x,p)(s)$
considered as a function of $s$. Since $m(x,p)(s)$ is Lipschitz in $s$ near $0$,
from Lemma 2.10, we have

$$m(x,p)^\circ(0; d) \ge 0, \quad \forall d \in R^n.$$

By Assumption 2.3,

$$m(x,p)^\circ(0; d) = f^\circ(x; d), \quad \forall d \in R^n \setminus \{0\}.$$

So

$$f^\circ(x; d) \ge 0, \quad \forall d \in R^n.$$

From Lemma 2.10, $0 \in \partial f(x)$ and $x$ is therefore a stationary point of
$f$. □

Thus under the assumptions of Lemma 2.12, if $x$ is not a stationary point
of the objective function $f$ in (1), then any solution $s^*$ of $SUB(x,p;\delta)$
with $\delta > 0$ will not be zero. Therefore from Lemma 2.11, no trial step $s$
obtained in STEP 1 will be zero. We will use this result in our later analysis.


2.4  Conditions Equivalent to First-Order Approximation


Assumption 2.3 requires that for regular functions, the one-sided directional
derivatives of the objective function $f$ in (1) and the local model $m(x,p)(s)$
must coincide. In many nonsmooth cases, it is more convenient to check if

$$\theta(x,p)(s) = \frac{f(x + s) - m(x,p)(s)}{\|s\|}$$

converges to zero as $\|s\|$ converges to zero, or if

$$\theta(x,p)(td) = \frac{f(x + td) - m(x,p)(td)}{t\,\|d\|}$$

converges to zero for all $d \in R^n \setminus \{0\}$ as $t \downarrow 0$. The next lemma
shows that these three conditions are equivalent for regular functions.
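The quantity $\theta(x,p)(td)$ is easy to monitor numerically. The sketch below does so for an assumed example that is not taken from the text: the objective $f(x) = \|F(x)\|_1$ with a hypothetical smooth map `F`, and the local model $m(x)(s) = \|F(x) + F'(x)s\|_1$ with no parameter vector. The Euclidean norm is used for $\|s\|$, which is also an assumption.

```python
import numpy as np

def F(x):
    """Hypothetical smooth residual map F : R^2 -> R^2."""
    return np.array([x[0] ** 2 - x[1], x[0] * x[1] - 1.0])

def F_jacobian(x):
    """Jacobian F'(x) of the hypothetical map above."""
    return np.array([[2.0 * x[0], -1.0],
                     [x[1], x[0]]])

def f(x):
    """Nonsmooth objective f(x) = ||F(x)||_1."""
    return np.sum(np.abs(F(x)))

def model(x, s):
    """Local model m(x)(s) = ||F(x) + F'(x) s||_1."""
    return np.sum(np.abs(F(x) + F_jacobian(x) @ s))

def theta(x, s):
    """theta(x)(s) = (f(x + s) - m(x)(s)) / ||s|| with the Euclidean norm."""
    return (f(x + s) - model(x, s)) / np.linalg.norm(s)

x = np.array([1.0, 1.0])            # F(x) = (0, 0), so f has a kink at x
d = np.array([1.0, 2.0])
ts = [10.0 ** (-j) for j in range(1, 7)]
thetas = [theta(x, t * d) for t in ts]
```

For this example $\theta(x)(td)$ shrinks proportionally to $t$, even though $f$ is not differentiable at $x$, which is the behavior the second and third conditions describe.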


Lemma 2.13 Under Assumptions 2.1 and 2.2, for $(x,p) \in L_0 \times P$, the
following three conditions are equivalent:


1. Assumption 2.3 :

$$m(x,p)(0) = f(x),$$

$$m(x,p)^\circ(0; d) = f^\circ(x; d), \quad \forall d \in R^n \setminus \{0\};$$

2.

$$\lim_{\|s\| \to 0} \theta(x,p)(s) = 0;$$

3.

$$\lim_{t \downarrow 0} \theta(x,p)(td) = 0$$

for all $d \ne 0$ in $R^n$.

Furthermore, any one of these conditions implies that, if the step $s$ obtained
from the basic TR algorithm in STEP 1 with $\delta > 0$ in $SUB(x,p;\delta)$ is equal
to zero, then $x$ is a stationary point of $f$.

Proof. We first prove that the first condition implies the second
condition. Suppose that Assumption 2.3 holds. If the limit of $\theta(x,p)(s)$ is
not zero, then there must exist a sequence $\{s_k\}$ and $\epsilon > 0$ such that
$\|s_k\| \to 0$ and

$$|\theta(x,p)(s_k)| = \frac{|f(x + s_k) - m(x,p)(s_k)|}{\|s_k\|} \ge \epsilon.$$

Let $t_k = \|s_k\|$, $d_k = s_k / \|s_k\|$. Since $m(x,p)(0) = f(x)$, we can write

$$\theta(x,p)(s_k) = \frac{f(x + t_k d_k) - f(x)}{t_k} - \frac{m(x,p)(t_k d_k) - m(x,p)(0)}{t_k}.$$

Since $\|d_k\| = 1$, there must exist a subsequence $\{d_{k_i}\}$ and $d_*$ such that
$\|d_{k_i}\| = \|d_*\| = 1$, $d_{k_i} \to d_*$ as $i \to \infty$, and
$|\theta(x,p)(s_{k_i})| \ge \epsilon$. Let $\{K'\}$ denote the index set $\{k_i\}$ of the
above subsequence. The first term of $\theta(x,p)(s_{k_i})$ can be written

$$\frac{f(x + t_k d_k) - f(x)}{t_k} = \frac{f(x + t_k d_k) - f(x + t_k d_*)}{t_k} + \frac{f(x + t_k d_*) - f(x)}{t_k}$$

for $k \in \{K'\}$. Since $f$ is Lipschitz near $x$ from Definition 2.7, we have

$$\left| \frac{f(x + t_{k_i} d_{k_i}) - f(x + t_{k_i} d_*)}{t_{k_i}} \right| \le K \|d_{k_i} - d_*\|$$

where $K$ is the Lipschitz constant of $f$ near $x$, and

$$\lim_{i \to \infty} \frac{f(x + t_{k_i} d_*) - f(x)}{t_{k_i}} = f'(x; d_*) = f^\circ(x; d_*).$$

Thus the first term of $\theta(x,p)(s_{k_i})$ converges to $f^\circ(x; d_*)$ as
$i \to \infty$. Similarly the second term of $\theta(x,p)(s_{k_i})$ converges to
$m(x,p)^\circ(0; d_*)$ as $i \to \infty$. By Assumption 2.3,
$m(x,p)^\circ(0; d_*) = f^\circ(x; d_*)$. Thus the limit of $\theta(x,p)(s_{k_i})$
exists and is equal to $0$ as $i \to \infty$, which contradicts
$|\theta(x,p)(s_{k_i})| \ge \epsilon$. Therefore the limit of $\theta(x,p)(s)$ must be
zero as $\|s\|$ converges to zero.


It is obvious that the second condition implies the third condition. Finally
we will show that the third condition implies the first condition. Suppose
that

$$\lim_{t \downarrow 0} \theta(x,p)(td) = \lim_{t \downarrow 0} \frac{f(x + td) - m(x,p)(td)}{t\,\|d\|} = 0$$

for all $d$ in $R^n \setminus \{0\}$. Recalling the definition of $\theta(x,p)(s)$, we
have

$$f(x + td) = m(x,p)(td) + \theta(x,p)(td)\, t\, \|d\|$$

where

$$\lim_{t \downarrow 0} \theta(x,p)(td) = 0$$

for all $d$ in $R^n \setminus \{0\}$. Since $f(x + td)$ and $m(x,p)(td)$ as functions
of $t$ are continuous at $t = 0$, by letting $t \downarrow 0$, we obtain
$f(x) = m(x,p)(0)$. It also follows from the above expression that

$$f^\circ(x; d) = \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}
= \lim_{t \downarrow 0} \left[ \frac{m(x,p)(td) - m(x,p)(0)}{t} + \theta(x,p)(td)\, \|d\| \right]
= m(x,p)^\circ(0; d)$$

for all $d$ in $R^n \setminus \{0\}$.


From Lemma 2.11 and Lemma 2.12, the first-order approximation in
Assumption 2.3 or any of the equivalent conditions also guarantees that $x$ is
a stationary point of $f$ in (1) when the step $s$ obtained from the basic TR
algorithm in STEP 1 with $\delta > 0$ in $SUB(x,p;\delta)$ is equal to zero. □


3  Case  Studies 


As nonsmooth norms and nonsmooth penalty or merit functions are widely
employed in both constrained and unconstrained optimization problems,
nonsmooth TR methods and their convergence analysis have become an active
research area in recent years. We will introduce some nonsmooth TR methods
in this section and show that the assumptions listed in Section 2 are
reasonable in that they apply to these TR methods. Therefore any
convergence theory which follows from these assumptions can be viewed as a
unified approach to global convergence for TR methods.

We let $\nabla$ denote the gradient operator and $\nabla^2$ the Hessian operator of a
functional defined in $R^n$. We also use the notation $\nabla c(x)$ to denote
$c'(x)^T$, the transpose of the Jacobian matrix at $x$ for a function
$c : R^n \to R^m$.


3.1  The Smooth Problem


The smooth problem

$$\min f(x),$$

where $f : R^n \to R$ is assumed to be continuously differentiable, can be solved
by the basic TR algorithm with the local model

$$m(x_k, B_k)(s) = f(x_k) + \nabla f(x_k)^T s + \tfrac{1}{2} s^T B_k s,$$

where each $B_k$ in $m(x_k, B_k)(s)$ is assumed to belong to a bounded set $P$ in
$R^{n \times n}$. Obvious choices for $B_k$ are $B_k = 0$, or $B_k = \nabla^2 f(x_k)$
when $f \in C^2$.

Now we show that the assumptions listed in Section 2 are satisfied for
the local model $m(x, B)(s)$. The continuous differentiability of $f$ implies the
regularity of $f$. Since $m(x, B)(s)$ is linear or quadratic in $s$ for every
$(x, B)$, the local model is regular in $s$ and Assumption 2.2 is satisfied. Since

$$f(x) = m(x, B)(0),$$

$$f^\circ(x; s) = \nabla f(x)^T s,$$

and

$$m(x, B)^\circ(0; s) = \nabla f(x)^T s,$$

for every $x \in R^n$ and $B \in R^{n \times n}$, Assumption 2.3 holds. Assumption 2.4
holds because continuity in $(x, B)$ of $m(\cdot, \cdot)(s)$ follows from the
continuity of $f(\cdot)$ and $\nabla f(\cdot)$. Assumption 2.5 holds as long as the
matrix $B_k$ is chosen from a bounded set $P \subset R^{n \times n}$.

Therefore any convergence analysis based on our assumptions will apply
to the smooth objective function $f$ and the local model $m(x, B)(s)$ with any
norm, including a nonsmooth norm, employed in the TR subproblem.
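As a small numerical illustration of the first-order agreement between a smooth objective and its quadratic model, the sketch below compares the slope of the model at $s = 0$ along a direction $d$ with $\nabla f(x)^T d$. The objective (the Rosenbrock function), the point, the direction, and the choice $B = I$ are hypothetical and serve only as an example of a matrix taken from a bounded set.

```python
import numpy as np

def f(x):
    """Hypothetical smooth objective (Rosenbrock function)."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def grad_f(x):
    """Gradient of the Rosenbrock function."""
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def quadratic_model(x, B, s):
    """m(x, B)(s) = f(x) + grad f(x)^T s + 0.5 s^T B s."""
    return f(x) + grad_f(x) @ s + 0.5 * s @ B @ s

x = np.array([-1.2, 1.0])
B = np.eye(2)                       # any matrix from a bounded set would do
d = np.array([0.3, -0.4])
t = 1e-6

model_at_zero = quadratic_model(x, B, np.zeros(2))
# One-sided difference quotient of the model at s = 0 along d.
model_slope = (quadratic_model(x, B, t * d) - model_at_zero) / t
exact_slope = grad_f(x) @ d
```

The curvature term contributes only $\tfrac{1}{2} t\, d^T B d$ to the quotient, so the model slope matches $\nabla f(x)^T d$ up to a term that vanishes with $t$, regardless of which bounded $B$ is used.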

Powell (1984) obtained a convergence theorem, which is similar to
Theorem 3.2 in Section 3.3, for TR methods for smooth unconstrained
minimization under weaker assumptions on the second derivative approximation
$B_k$.


3.2  Fletcher’s TR Method


Fletcher (1982), (1984), (1987) suggested a TR method to solve nonsmooth
unconstrained problems of the form

$$\min \phi(x) = f(x) + h(c(x)) \tag{6}$$

where $f : R^n \to R$ and $c : R^n \to R^m$ are twice continuously differentiable,
and $h : R^m \to R$ is the polyhedral convex function

$$h(x) = \max_i\,(h_i^T x + b_i) \tag{7}$$

where the vectors $h_i$ and the scalars $b_i$ are given. The $\ell_1$ and
$\ell_\infty$ penalty functions

$$P_1(x) = f(x) + \rho \|c(x)\|_1$$

$$P_\infty(x) = f(x) + \rho \|c(x)\|_\infty$$

with penalty parameter $\rho$ are often used in solving the smooth equality
constrained problem

$$\min f(x) \quad \text{s.t.} \quad c(x) = 0,$$

and can be written in the form (6).

The basic model algorithm and convergence analysis of Fletcher’s TR
method were given in Fletcher (1982). This material is also reviewed in the
paper Fletcher (1984) and in the book Fletcher (1987).

The basic local model employed in Fletcher’s TR algorithm is

$$m(x_k, \lambda_k)(s) = f(x_k) + \nabla f(x_k)^T s + \tfrac{1}{2} s^T B(x_k, \lambda_k) s + h(c(x_k) + \nabla c(x_k)^T s), \tag{8}$$

where

$$B(x_k, \lambda_k) = \nabla^2 f(x_k) + \sum_{i=1}^{m} \lambda_i^k \nabla^2 c_i(x_k)$$

and $\lambda_i^k$ is the $i$-th component of the multiplier $\lambda_k$ associated with
the previous TR subproblem.
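The structure of this local model can be illustrated with a small sketch. The problem data below (a linear $f$, a single constraint $c$, the choice $h(y) = \rho \|y\|_1$, the penalty value, and the multiplier estimate) are hypothetical and chosen only to keep the example short; they are not taken from Fletcher's papers.

```python
import numpy as np

RHO = 10.0  # hypothetical penalty parameter

def f(x):
    return x[0] + x[1]

def grad_f(x):
    return np.array([1.0, 1.0])

def hess_f(x):
    return np.zeros((2, 2))

def c(x):
    """Single hypothetical constraint c(x) = x1^2 + x2^2 - 2 (so m = 1)."""
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])

def grad_c(x):
    """n x m matrix: transpose of the Jacobian of c."""
    return np.array([[2.0 * x[0]], [2.0 * x[1]]])

def hess_c(x):
    """List with one Hessian per constraint component."""
    return [2.0 * np.eye(2)]

def h(y):
    """Polyhedral convex function h(y) = RHO * ||y||_1."""
    return RHO * np.sum(np.abs(y))

def phi(x):
    """Composite objective phi(x) = f(x) + h(c(x))."""
    return f(x) + h(c(x))

def fletcher_model(x, lam, s):
    """Quadratic-plus-composite local model with multiplier estimate lam."""
    B = hess_f(x) + sum(l_i * H_i for l_i, H_i in zip(lam, hess_c(x)))
    return (f(x) + grad_f(x) @ s + 0.5 * s @ B @ s
            + h(c(x) + grad_c(x).T @ s))

x = np.array([1.5, 0.5])
lam = np.array([0.3])
s = 1e-4 * np.array([1.0, -1.0])
theta = (phi(x + s) - fletcher_model(x, lam, s)) / np.linalg.norm(s)
```

The model reproduces $\phi(x)$ at $s = 0$, and the scaled error `theta` is small for small $\|s\|$, which is the first-order agreement that the unified theory asks of a local model.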
The step in Fletcher’s algorithm is accepted if

$$r_k = \frac{\phi(x_k) - \phi(x_k + s_k)}{\phi(x_k) - m(x_k, \lambda_k)(s_k)} > 0.$$

A global solution of the TR subproblem in STEP 1 is required in Fletcher’s
algorithm. The following global convergence theorem was given by Fletcher.

Theorem 3.1 (Fletcher (1982), (1987)) Let the sequence $\{x_k\}$ generated
by the TR algorithm be contained in a bounded (convex) set $D$ and let $f$, $c$
be $C^2$ functions whose second derivative matrices are bounded on $D$. Then
there exists an accumulation point $x^\infty$ of the iteration sequence such that
$x^\infty$ is a stationary point of $\phi$.


We will show that the objective function $\phi(x)$ given by (6) is regular and
Assumptions 2.2 through 2.5 hold for the local model $m(x_k, \lambda_k)(s)$ under
the assumptions made by Fletcher in Theorem 3.1.

Since $h$ is convex, by Corollary 2.9, $\phi(x)$ given by (6) is regular on $D$ and
$m(x, \lambda)(s)$ is regular in $s$ for every $(x, \lambda) \in D \times R^m$, i.e.,
Assumptions 2.1 and 2.2 hold.

From Lemma 2.13, we may verify Assumption 2.3 by checking if

$$\theta(x, \lambda)(s) = \frac{\phi(x + s) - m(x, \lambda)(s)}{\|s\|}$$

converges to $0$ as $\|s\|$ converges to $0$ for every $(x, \lambda) \in D \times R^m$.
From the Taylor expansion for $f$ at $x$,

$$f(x + s) = f(x) + \nabla f(x)^T s + \tfrac{1}{2} s^T \nabla^2 f(\xi) s$$

where $\xi \to x$ as $\|s\| \to 0$. We have

$$\theta(x, \lambda)(s) = \frac{1}{2}\, \frac{s^T [\nabla^2 f(\xi) - \nabla^2 f(x)] s}{\|s\|}
+ \frac{h(c(x + s)) - h(c(x) + \nabla c(x)^T s)}{\|s\|}
- \frac{1}{2} \sum_{i=1}^{m} \lambda_i\, \frac{s^T \nabla^2 c_i(x) s}{\|s\|}.$$

Since $f$ is twice continuously differentiable on $D$, the first term of
$\theta(x, \lambda)(s)$

$$\frac{1}{2}\, \frac{s^T [\nabla^2 f(\xi) - \nabla^2 f(x)] s}{\|s\|}$$

converges to $0$ as $\|s\| \to 0$. From Theorem 2.8, the convex function $h$ is
Lipschitz near $c(x)$ with Lipschitz constant $K$, which implies that the second
term of $\theta(x, \lambda)(s)$ satisfies

$$\frac{|h(c(x + s)) - h(c(x) + \nabla c(x)^T s)|}{\|s\|} \le K\, \frac{\|c(x + s) - c(x) - \nabla c(x)^T s\|}{\|s\|}$$

and therefore converges to $0$ for every $x \in D$ as $\|s\| \to 0$ because of the
Fréchet differentiability of the function $c$. The third term of
$\theta(x, \lambda)(s)$

$$-\frac{1}{2} \sum_{i=1}^{m} \lambda_i\, \frac{s^T \nabla^2 c_i(x) s}{\|s\|}$$

converges to $0$ as $\|s\| \to 0$ for every $(x, \lambda) \in D \times R^m$. Therefore
Assumption 2.3 holds.


Since $f$ and $c$ are twice continuously differentiable and the convex function
$h$ is continuous, the local model $m(x, \lambda)(s)$ is continuous in $(x, \lambda)$
for every $s$ in $R^n$, i.e., Assumption 2.4 holds.

Under the above assumptions, Fletcher proved that the multiplier $\lambda_k$ is
uniformly bounded in $k$. For example, see (1.6) in Fletcher (1984) or Lemma
14.2.1 in Fletcher (1987). This says that the parameter vectors $\lambda_k$ are
chosen from a bounded set, i.e., Assumption 2.5 holds. Fletcher was aware that
his convergence theory still holds if the matrices $B(x_k, \lambda_k)$ in (8) are
replaced with matrices $B_k$ belonging to a bounded set. See Section 4 of
Fletcher (1984) for example. This is also an implication of our unified theory.

Notice that we only deal with the case that $r(x,p)(s) > c_0 > 0$ in this
paper, i.e., the TR step is accepted when the ratio is greater than a positive
constant $c_0$. The TR step is accepted in Fletcher's TR algorithm when the
ratio $r(x,p)(s) > 0$, i.e., $c_0 = 0$. Fletcher's convergence result in Theorem
3.1 says that there exists an accumulation point of the TR iterates which is a
stationary point of the objective function. The convergence result obtained
in this paper for the slightly more restrictive acceptance test is the stronger
result that any accumulation point of the TR iteration sequence is a stationary
point of the objective function. Hence by making a slight modification
of Fletcher's algorithm we establish a stronger result.


3.3  Powell  and  Yuan’s  TR  Method 

Powell  (1983)  considered  a  TR  method  to  solve 

\[ \min_x \; h(F(x)), \]

where $F(x) = (f_1(x), \ldots, f_m(x))^T : R^n \to R^m$ is continuously differentiable
and $h : R^m \to R$ is convex. He used the local model

\[ m(x_k, B_k)(s) = h(F(x_k) + F'(x_k)s) + \tfrac{1}{2} s^T B_k s \]

where $B_k$ is an $n \times n$ symmetric matrix. This general type of nonlinear
problem includes the minmax problem

\[ \min_x \max_i |f_i(x)| = \|F(x)\|_\infty, \]

the nonlinear $l_1$ problem

\[ \min_x \sum_{i=1}^{m} |f_i(x)| = \|F(x)\|_1, \]

and the nonlinear least-squares problem

\[ \min_x \sum_{i=1}^{m} f_i(x)^2 = \|F(x)\|_2^2. \]

Yuan  (1983)  proved  the  following  global  convergence  result  for  Powell’s  TR 
method. 

Theorem 3.2 (Yuan (1983)) Assume that $F$ is continuously differentiable
and $h$ is convex in $R^m$. If $h(F(x))$ is bounded below, the sequence $\{x_k\}$
generated by the TR algorithm is bounded, and if the inequality

\[ \|B_k\| \le c_3 + c_4 k, \]

where $c_3, c_4$ are positive constants, holds for all $k$, then there exists an
accumulation point of $\{x_k\}$ which is a stationary point of $h(F(x))$.
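
As an illustration only, the following minimal Python sketch evaluates Powell's local model $m(x,B)(s) = h(F(x) + F'(x)s) + \tfrac{1}{2}s^T B s$ for the three convex choices of $h$ listed above. The function names, the sample residual vector, the Jacobian and the matrix $B$ are hypothetical and are not taken from Powell (1983) or Yuan (1983).

```python
# Sketch of Powell's local model for three convex outer functions h.
# All names and numbers below are illustrative assumptions.

def h_inf(v):
    """h(v) = ||v||_inf (minmax problem)."""
    return max(abs(t) for t in v)

def h_one(v):
    """h(v) = ||v||_1 (nonlinear l1 problem)."""
    return sum(abs(t) for t in v)

def h_sq(v):
    """h(v) = ||v||_2^2 (nonlinear least squares)."""
    return sum(t * t for t in v)

def local_model(h, F_x, J_x, B, s):
    """m(x,B)(s) = h(F(x) + F'(x)s) + 0.5 * s^T B s."""
    n = len(s)
    linearized = [F_x[i] + sum(J_x[i][j] * s[j] for j in range(n))
                  for i in range(len(F_x))]
    quad = 0.5 * sum(s[i] * B[i][j] * s[j]
                     for i in range(n) for j in range(n))
    return h(linearized) + quad

# Hypothetical data: F(x) = (x1^2 - 1, x1 + x2) evaluated at x = (2, 1).
F_x = [3.0, 3.0]
J_x = [[4.0, 0.0], [1.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
s = [-0.5, 0.0]

m_inf = local_model(h_inf, F_x, J_x, B, s)
m_one = local_model(h_one, F_x, J_x, B, s)
m_sq = local_model(h_sq, F_x, J_x, B, s)
```

At $s = 0$ each model reduces to $h(F(x))$, the value of the objective at the current iterate.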
We only consider the case when $B_k$ is bounded for all $k$, so that the
local model $m(x,B)(s)$ satisfies Assumption 2.5. Since $h$ is convex and $F$ is
continuously differentiable, by Corollary 2.9, the composite function $h(F(x))$
is regular on $R^n$ and $h(F(x) + F'(x)s)$ is regular in $s$ for every $x \in R^n$.
Since the second term of $m(x,B)(s)$ is differentiable in $s$ for every
$B \in R^{n \times n}$, the local model $m(x,B)(s)$ is regular in $s$ for every
$(x,B) \in R^n \times R^{n \times n}$, i.e., Assumptions 2.1 and 2.2 hold.


We verify Assumption 2.3 by considering the second equivalent form in
Lemma 2.13. From the Frechet differentiability of $F$, we have

\[ F(x+s) = F(x) + F'(x)s + \sigma(x,s)\|s\| \]

where

\[ \sigma(x,s) \triangleq \frac{F(x+s) - F(x) - F'(x)s}{\|s\|} \]

and

\[ \lim_{\|s\| \to 0} \|\sigma(x,s)\| = 0 \]

for every $x \in R^n$. Thus,

\[ h(F(x+s)) - m(x,B)(s) =
   h(F(x) + F'(x)s + \sigma(x,s)\|s\|) - h(F(x) + F'(x)s) - \tfrac{1}{2} s^T B s. \]

The convexity of $h$ implies from Theorem 2.8 that $h$ is Lipschitz near
$F(x) + F'(x)s$ with constant $K$, so that

\[ |\theta(x,B)(s)| = \frac{|h(F(x+s)) - m(x,B)(s)|}{\|s\|}
   \le K\|\sigma(x,s)\| + \tfrac{1}{2}\|s\|\,\|B\| \]

for every $(x,B) \in R^n \times R^{n \times n}$. Therefore

\[ \lim_{\|s\| \to 0} \theta(x,B)(s) = 0 \]

for every $(x,B) \in R^n \times R^{n \times n}$. By Lemma 2.13, Assumption 2.3 holds.


The local model $m(x,B)(s)$ is continuous in $(x,B)$ for every $s \in R^n$
because of the continuous differentiability of $F$ and the continuity of the
convex function $h$. Thus Assumption 2.4 holds.

We have demonstrated that Powell and Yuan's TR method satisfies Assumption
2.1 through Assumption 2.5 under the assumptions of Theorem 3.2
and the additional assumption that $\{B_k\}$ is bounded. This allows us to use
Theorem 4.3 to prove that any accumulation point of the TR iterates is a
stationary point of the objective function $h(F(x))$. Theorem 3.2 says that there
exists an accumulation point of the TR iterates which is a stationary point
of the objective function $h(F(x))$. Thus we make a stronger assumption and
derive a stronger conclusion.


3.4  Zhang,  Kim  and  Lasdon's  Successive  Linear  Programming  Method

Zhang,  Kim  and  Lasdon  (1985)  suggested  a  TR  algorithm  for  solving  the 
constrained  problem 

\[ \min \; h_0(x) \]
\[ \text{s.t.} \;\; h_i(x) = 0, \quad i = 1, \ldots, k, \]
\[ \qquad h_j(x) \le 0, \quad j = k+1, \ldots, m, \]

where $h_i : R^n \to R$, $i = 0, 1, \ldots, m$, are continuously differentiable. They
solve the constrained problem by minimizing an $l_1$ penalty function

\[ f(x) = h_0(x) + \sum_{i=1}^{k} w_i\,|h_i(x)| + \sum_{i=k+1}^{m} w_i \max(0, h_i(x)), \]

where $w_i$, $i = 1, \ldots, m$, are penalty parameters. The local model

\[ m(x)(s) = h_0(x) + \nabla h_0(x)^T s + \sum_{i=1}^{k} w_i\,|h_i(x) + \nabla h_i(x)^T s|
   + \sum_{i=k+1}^{m} w_i \max(0, h_i(x) + \nabla h_i(x)^T s) \]

is employed in the TR subproblem

\[ \min \; m(x)(s) \quad \text{s.t.} \;\; \|s\|_\infty \le \delta. \]

This subproblem can be transformed into a linear programming (LP) problem.
Thus they compute a stationary point of the penalty function $f(x)$ by
solving a sequence of LPs; hence their algorithm can be viewed as a successive
linear programming (SLP) method. They obtained the following global
convergence theorem.


Theorem 3.3 (Zhang, Kim and Lasdon (1985)) If $h_i$, $i = 0, 1, \ldots, m$,
are continuously differentiable and the level set of $f(x)$ at the initial point of
the algorithm is bounded, then the sequence $\{x_k\}$ of TR iterates has
accumulation points, and every accumulation point of $\{x_k\}$ is a stationary point of
$f(x)$. Furthermore, if an accumulation point is feasible for the constrained
problem, then it is a Kuhn-Tucker point of the constrained problem.
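
The $l_1$ penalty function and the local model used in the SLP subproblem can be sketched as follows. This is a minimal illustration, not Zhang, Kim and Lasdon's implementation: the function names, the sample constraint values, gradients and penalty weights are all hypothetical, and constraints are indexed from 0 in the code, with the first `k` entries treated as equalities.

```python
# Sketch of the l1 penalty function f and of its piecewise-linear model m(x)(s).
# Names and data are illustrative assumptions.

def penalty(h0, h, w, k):
    """f = h0 + sum_{i<k} w_i |h_i| + sum_{i>=k} w_i max(0, h_i)."""
    eq_part = sum(w[i] * abs(h[i]) for i in range(k))
    ineq_part = sum(w[i] * max(0.0, h[i]) for i in range(k, len(h)))
    return h0 + eq_part + ineq_part

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def local_model(h0, g0, h, g, w, k, s):
    """m(x)(s): the penalty function applied to the linearized h_i at x."""
    h_lin = [h[i] + dot(g[i], s) for i in range(len(h))]
    return penalty(h0 + dot(g0, s), h_lin, w, k)

# Hypothetical problem: minimize x1 + x2 subject to
# x1^2 + x2^2 - 2 = 0 and -x1 <= 0, evaluated at x = (2, 0).
h0, g0 = 2.0, [1.0, 1.0]
h = [2.0, -2.0]                 # constraint values h_1(x), h_2(x)
g = [[4.0, 0.0], [-1.0, 0.0]]   # constraint gradients
w = [10.0, 10.0]                # fixed penalty parameters
k = 1                           # number of equality constraints

f_val = penalty(h0, h, w, k)
m_zero = local_model(h0, g0, h, g, w, k, [0.0, 0.0])
m_step = local_model(h0, g0, h, g, w, k, [-0.5, 0.0])
```

The model agrees with $f$ at $s = 0$, and because it is piecewise linear in $s$, minimizing it over the box $\|s\|_\infty \le \delta$ is the step that can be posed as an LP.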


We now show that this TR method satisfies Assumptions 2.1 through
Assumption 2.4 under the assumptions made by Zhang, Kim and Lasdon
(1985). Assumption 2.5 is not applicable in this case because no parameter
vector is involved in the local model $m(x)(s)$. Here the penalty parameters $w_i$
are considered to be constants and are not changed iteration by iteration in
the TR method since they appear in both the objective function and the local
models. Since every $h_i(x)$ is continuously differentiable and the functions $|t|$,
$\max(0,t)$ are convex, by Corollary 2.9, the composite function $f(x)$ is regular
in $R^n$. Also $m(x)(s)$ is convex in $s$ for every $x \in R^n$ because of the convexity
of the functions $|t|$ and $\max(0,t)$. Hence Assumptions 2.1 and 2.2 hold by
Corollary 2.9. The continuous differentiability of $h_i$ and the continuity of
the functions $|t|$ and $\max(0,t)$ imply that $m(x)(s)$ is continuous in $x$ for all
$s \in R^n$, i.e., Assumption 2.4 holds.

We need to show that Assumption 2.3 holds. From the Frechet
differentiability of $h_i$, we have

\[ h_i(x+s) = h_i(x) + \nabla h_i(x)^T s + \sigma_i(x,s)\|s\|, \quad i = 0, 1, \ldots, m, \]

where

\[ \sigma_i(x,s) \triangleq \frac{h_i(x+s) - h_i(x) - \nabla h_i(x)^T s}{\|s\|} \to 0 \]

as $\|s\| \to 0$ for every $x \in R^n$. The inequalities

\[ \big|\, |h_i(x+s)| - |h_i(x) + \nabla h_i(x)^T s| \,\big| \le |\sigma_i(x,s)|\,\|s\|, \quad i = 1, \ldots, k, \]

and

\[ |\max(0, h_i(x+s)) - \max(0, h_i(x) + \nabla h_i(x)^T s)| \le |\sigma_i(x,s)|\,\|s\|, \quad i = k+1, \ldots, m, \]

imply that

\[ |f(x+s) - m(x)(s)| \le \Big( |\sigma_0(x,s)| + \sum_{i=1}^{m} w_i |\sigma_i(x,s)| \Big) \|s\|, \]

which shows that

\[ \frac{|f(x+s) - m(x)(s)|}{\|s\|} \le |\sigma_0(x,s)| + \sum_{i=1}^{m} w_i |\sigma_i(x,s)| \to 0 \]

as $\|s\| \to 0$ for every $x \in R^n$. By Lemma 2.13, Assumption 2.3 holds.


3.5  Duff,  Nocedal  and  Reid’s  TR  Method 

Duff,  Nocedal  and  Reid  (1987)  suggested  a  TR  method  to  solve  a  system  of 
nonlinear  equations 

F(x)  =  0 

where $F : R^n \to R^n$ is continuously differentiable. As a globalization strategy
for a locally convergent method, for example, Newton's method or a secant
method, they solve the unconstrained nonsmooth problem

\[ \min \; f(x) = \|F(x)\|_1 \]

by a TR method with subproblem

\[ \min \; m(x_k)(s) = \|F(x_k) + F'(x_k)s\|_1 \quad \text{s.t.} \;\; \|s\|_\infty \le \delta_k. \]

They pointed out that in this way one can use LP techniques to solve the TR
subproblem in each iteration, and therefore take advantage of any sparsity
patterns in the Jacobian $F'(x_k)$ more readily than in an $l_2$ TR method.
Instead of the ratio test in the basic TR algorithm, they employed a sufficient
decrease condition

\[ \|F(x+s)\|_1 \le \|F(x)\|_1 - \alpha\,\|F'(x)s\|_1, \]

where $1 > \alpha > 0$ is a fixed constant, to accept or reject the new iterate and
used the ratio $r_k$ as a basis for reducing or increasing the TR radius.
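
A minimal sketch of this acceptance test is given below, assuming the non-strict form of the inequality. The helper names, the sample system $F(x) = (x_1^2 - 1, x_2)$ and the value of $\alpha$ are hypothetical and only serve to show one accepted and one rejected trial step.

```python
# Sketch of the sufficient decrease test
#   ||F(x+s)||_1 <= ||F(x)||_1 - alpha * ||F'(x)s||_1.
# Names and data are illustrative assumptions.

def norm1(v):
    return sum(abs(t) for t in v)

def sufficient_decrease(F_x, F_trial, Js, alpha):
    """Return True when the trial point x + s passes the test."""
    return norm1(F_trial) <= norm1(F_x) - alpha * norm1(Js)

def F(x):
    # Hypothetical system F(x) = (x1^2 - 1, x2).
    return [x[0] * x[0] - 1.0, x[1]]

def jac_times(x, s):
    # F'(x)s for the hypothetical system above.
    return [2.0 * x[0] * s[0], s[1]]

x = [2.0, 1.0]
alpha = 0.1

good = [-0.5, -1.0]   # moves toward the root (1, 0)
bad = [0.5, 0.0]      # moves away from it

accept_good = sufficient_decrease(
    F(x), F([x[0] + good[0], x[1] + good[1]]), jac_times(x, good), alpha)
accept_bad = sufficient_decrease(
    F(x), F([x[0] + bad[0], x[1] + bad[1]]), jac_times(x, bad), alpha)
```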

Duff, Nocedal and Reid (1987) did not give a convergence result and
pointed out that their approach of updating the TR radius is open to
improvement. Their method is considered in this paper to be a special case
of El Hallabi and Tapia's TR method. Therefore the discussion on the
assumptions and the convergence for their method is contained in the next
subsection.


3.6  El  Hallabi  and  Tapia’s  TR  Method 

El  Hallabi  and  Tapia  (1987)  analyze  an  arbitrary  norm  TR  method  for  solving 
a  system  of  nonlinear  equations 


\[ F(x) = 0, \]

where $F : R^n \to R^n$ is continuously differentiable. To obtain a solution, they
solve the unconstrained nonsmooth problem

\[ \min \; f(x) = \|F(x)\| \]

by a TR algorithm as a globalization strategy with the local model

\[ m(x)(s) = \|F(x) + F'(x)s\|. \]

Different norms are allowed in the various parts of the subproblem

\[ \min \; m(x_k)(s) = \|F(x_k) + F'(x_k)s\|_a \quad \text{s.t.} \;\; \|s\|_b \le \delta_k \]

where $\|\cdot\|_a$ and $\|\cdot\|_b$ are any two norms on $R^n$. In their algorithm,
$s_k$ is a solution of the subproblem in STEP 1, and $\delta_k$ is reduced in STEP 3
until a sufficient decrease condition

\[ f(x_k + s_k) \le f(x_k) + \alpha\,\gamma(x_k, s_k) \]

is satisfied, where the choice for $\gamma(x,s)$, for example, may be

\[ \gamma(x,s) = \|F(x) + F'(x)s\|_a - \|F(x)\|_a = m(x)(s) - m(x)(0). \]

Thus, if $\gamma(x,s) < 0$, then the condition

\[ f(x+s) \le f(x) + \alpha\,\gamma(x,s) \]

is equivalent to

\[ \frac{f(x+s) - f(x)}{m(x)(s) - m(x)(0)} \ge \alpha, \]

the standard TR ratio test.

It is worth mentioning that El Hallabi and Tapia (1987) established the
inequalities

\[ \gamma_3(x,s) \le \gamma_2(x,s) \le \gamma_1(x,s) \]

for the three choices of $\gamma(x,s)$

\[ \gamma_1(x,s) = \|F(x) + F'(x)s\| - \|F(x)\| \]

\[ \gamma_2(x,s) = \]

\[ \gamma_3(x,s) = -\|F'(x)s\|, \]

which means that the decreases $\alpha|\gamma_i(x,s)|$ required by the algorithm in each
iteration with these three different choices of $\gamma(x,s)$ satisfy the inequalities

\[ \alpha|\gamma_3(x,s)| \ge \alpha|\gamma_2(x,s)| \ge \alpha|\gamma_1(x,s)| \ge 0. \]

Clearly, $\gamma_3$ corresponds to Duff, Nocedal and Reid's test. Also $\gamma_1(x,s)$ is the
preferred choice among the three in the sense that it asks the least decrease
in each iteration and is most easily satisfied.
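
The outer inequality $\gamma_3(x,s) \le \gamma_1(x,s)$ is an instance of the reverse triangle inequality, and it can be spot-checked numerically. The sketch below is illustrative only: it uses the $l_1$ norm, hypothetical sample vectors standing in for $F(x)$ and $F'(x)s$, and it leaves out $\gamma_2$, whose definition is not reproduced in the text above.

```python
# Numerical spot-check of gamma_3 <= gamma_1 for
#   gamma_1 = ||F(x) + F'(x)s|| - ||F(x)||,   gamma_3 = -||F'(x)s||.
# The vectors below are illustrative assumptions.

def norm1(v):
    return sum(abs(t) for t in v)

def gamma1(F_x, Js):
    shifted = [a + b for a, b in zip(F_x, Js)]
    return norm1(shifted) - norm1(F_x)

def gamma3(Js):
    return -norm1(Js)

F_x = [3.0, 1.0]
samples = [[-2.0, -1.0], [2.0, -1.0], [-3.0, -1.0], [0.5, 0.25], [-6.0, 4.0]]

pairs = [(gamma3(Js), gamma1(F_x, Js)) for Js in samples]
ordering_holds = all(g3 <= g1 for g3, g1 in pairs)
```

For the first sample the two quantities coincide at $-3$, which shows that the bound can be attained; whenever $\gamma_1 < 0$, the decrease $\alpha|\gamma_3|$ demanded by Duff, Nocedal and Reid's test is at least as large as $\alpha|\gamma_1|$.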
El Hallabi and Tapia (1987) proved the following global convergence
theorem for their TR algorithm with various upper semi-continuous $\gamma(x,s)$ with
respect to $(x,s)$ in the sufficient decrease condition. Their theory handled the
choice $\gamma_1$, a modified form of $\gamma_2$, and could not handle $\gamma_3$. They conjectured
that the choices $\gamma_2$ and $\gamma_3$ do not lead to globally convergent algorithms.

Theorem 3.4 (El Hallabi and Tapia (1987)) If $F$ is continuously
differentiable and the level set of $f$ at the initial point is bounded, then any
accumulation point of the sequence $\{x_k\}$ generated by the TR algorithm is a
stationary point of $f$.


Since  the  different  norms  in  the  subproblem  do  not  cause  any  difficulty 
in  the  global  convergence  we  omit  the  subscripts  on  the  norms  in  the  rest  of 
this  section.  El  Hallabi  and  Tapia  (1987)  were  aware  that  global  convergence 
can  also  be  obtained  without  any  difficulty  if  the  function  F  in  the  problem 
is a mapping from $R^n$ to $R^m$ where $m < n$. Hence consider $F : R^n \to R^m$
in  the  rest  of  this  section.  We  will  show  that  Assumptions  2.1  through  2.4 
hold.  Thus  the  convergence  analysis  given  in  §4.1  and  §4.2  for  the  basic 
TR  algorithm  applies  to  El  Hallabi  and  Tapia’s  TR  method  under  their 
assumptions. 

Since the norm is a convex function and $F$ is continuously differentiable,
from Corollary 2.9, the composite function $f$ is regular. The convexity of
the norms also implies that $m(x)(s)$ is regular in $s$ for every $x$ in $R^n$, i.e.,
Assumptions 2.1 and 2.2 hold. It is easy to see that the local model $m(x)(s)$
is continuous in $x$ for every $s \in R^n$, i.e., Assumption 2.4 holds. The Frechet
differentiability of $F$ implies that

\[ F(x+s) = F(x) + F'(x)s + \sigma(x,s)\|s\| \]

where

\[ \sigma(x,s) \triangleq \frac{F(x+s) - F(x) - F'(x)s}{\|s\|} \to 0 \]

as $\|s\| \to 0$ for every $x \in R^n$. Thus

\[ |f(x+s) - m(x)(s)| \le \|F(x+s) - [F(x) + F'(x)s]\| = \|\sigma(x,s)\|\,\|s\|. \]

It follows that

\[ \frac{|f(x+s) - m(x)(s)|}{\|s\|} \le \|\sigma(x,s)\| \to 0 \]

as $\|s\| \to 0$ for every $x \in R^n$. From Lemma 2.13, Assumption 2.3 holds.
Assumption 2.5 is not applicable in this case because no parameter vector
is involved in the local model $m(x)(s)$. We have verified Assumption 2.1
through Assumption 2.4 for the local model under the assumptions made by
El Hallabi and Tapia (1987).
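
The bound $|f(x+s) - m(x)(s)|/\|s\| \le \|\sigma(x,s)\|$ can be observed numerically. The following sketch is an illustration under stated assumptions: it uses the max norm, a hypothetical map $F(x) = (x_1^2 - 1, x_1 x_2)$, the point $x = (2,1)$ and steps $s = t(1,1)$ with shrinking $t$.

```python
# Numerical illustration that theta(s) = |f(x+s) - m(x)(s)| / ||s|| -> 0
# for f(x) = ||F(x)|| and m(x)(s) = ||F(x) + F'(x)s||.
# The map F, the point x and the direction are illustrative assumptions.

def norm_inf(v):
    return max(abs(t) for t in v)

def F(x):
    return [x[0] * x[0] - 1.0, x[0] * x[1]]

def jac(x):
    # Jacobian of the hypothetical F above.
    return [[2.0 * x[0], 0.0], [x[1], x[0]]]

def f(x):
    return norm_inf(F(x))

def model(x, s):
    J = jac(x)
    Fx = F(x)
    lin = [Fx[i] + J[i][0] * s[0] + J[i][1] * s[1] for i in range(2)]
    return norm_inf(lin)

def theta(x, s):
    x_new = [x[0] + s[0], x[1] + s[1]]
    return abs(f(x_new) - model(x, s)) / norm_inf(s)

x = [2.0, 1.0]
steps = [0.1, 0.01, 0.001]
thetas = [theta(x, [t, t]) for t in steps]
```

For this example the linearization error is $(t^2, t^2)$, so `theta` equals $t$ up to rounding and shrinks linearly with the step length.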


4  Convergence  Analysis 


Recall that the approximate solution $s_k$ of the subproblem $SUB(x_k,p_k;\delta_k)$
obtained in STEP 1 of the basic TR algorithm satisfies conditions (2) and
(3). Since an exact solution $s_k^*$ satisfies the above conditions with $\tau = 1$, the
convergence results obtained in this section can be applied to a TR iteration
that uses the exact solution in STEP 1.

The following theorem says that if the current TR iterate $x_k$ is not a
stationary point of $f$, then there must exist a small TR radius $\bar\delta$ and a
neighborhood $N_k$ of $x_k$ such that the ratio $r(x,p)(s) \ge c_0$ for any $x \in N_k$ and
$0 < \delta \le \bar\delta$, where $s$ satisfies the two conditions (2) and (3) for the
subproblem $SUB(x,p;\delta)$ and $c_0 \in (0,1)$. Thus if the basic TR algorithm loops
infinitely often between STEP 1 and STEP 3 with $x_{k+j} = x_k$ and $p_{k+j} = p_k$
for $j \ge 0$, i.e., no such $\bar\delta$ and $N_k$ exist, then $x_k$ must be a stationary
point of $f$.


Theorem 4.1 Let $(\bar x, p) \in L_0 \times P$ be given and let $c_0$ be in $(0,1)$. Under
Assumptions 2.1 through 2.5, if $\bar x$ is not a stationary point of $f$, then there
exist $\bar\delta > 0$ and $\bar\varepsilon > 0$ such that, for every $x$ and $\delta$ satisfying
$\|x - \bar x\| \le \bar\varepsilon$ and $0 < \delta \le \bar\delta$, we have

\[ r(x,p)(s) = \frac{f(x) - f(x+s)}{f(x) - m(x,p)(s)} \ge c_0 \]

for any $s$ obtained from STEP 1 of the basic TR algorithm for the subproblem
$SUB(x,p;\delta)$.
Proof. Since $\bar x$ is not a stationary point of $f$, we know from Lemma 2.10
that there must exist a direction $d \in R^n$ with $\|d\| = 1$ such that
$f^\circ(\bar x; d) < 0$. Let $\eta = -f^\circ(\bar x; d) > 0$. By the definition of the
generalized directional derivative,

\[ f^\circ(\bar x; d) = \limsup_{x \to \bar x,\; \delta \downarrow 0} \frac{f(x + \delta d) - f(x)}{\delta} = -\eta < 0. \]

Hence there exist $\bar\varepsilon > 0$ and $\bar\delta > 0$ such that

\[ \frac{f(x + \delta d) - f(x)}{\delta} \le -\frac{\eta}{2}, \]

i.e.,

\[ ared(x, \delta d) = f(x) - f(x + \delta d) \ge \frac{\eta}{2}\,\delta \qquad (9) \]

for any $x$ and $\delta$ that satisfy $\|x - \bar x\| \le \bar\varepsilon$ and $0 < \delta \le \bar\delta$.

From Lemma 2.13, Assumption 2.3 implies that for every $(x,p) \in L_0 \times P$
and, in particular, for every $x \in N(\bar\varepsilon) = \{x \mid \|x - \bar x\| \le \bar\varepsilon\}$,

\[ f(x+s) = m(x,p)(s) + \theta(x,p)(s)\|s\| \]

where

\[ \lim_{\|s\| \to 0} \theta(x,p)(s) = 0. \]

It follows that the actual reduction for any $s$ with $\|s\| \le \delta$ can be written as

\[ ared(x,s) = f(x) - f(x+s)
   = f(x) - m(x,p)(s) - \theta(x,p)(s)\|s\|
   = pred(x,p)(s) - \theta(x,p)(s)\|s\|, \]

where $pred(x,p)(s) = m(x,p)(0) - m(x,p)(s) = f(x) - m(x,p)(s)$ is the
predicted reduction. Thus

\[ r(x,p)(s) = \frac{ared(x,s)}{pred(x,p)(s)} = 1 - \frac{\theta(x,p)(s)\|s\|}{pred(x,p)(s)} \qquad (10) \]

for every $x \in N(\bar\varepsilon)$. Since $s = \delta d$ is feasible for $SUB(x,p;\delta)$, it
follows from (9) that

\[ \frac{\eta}{2}\,\delta \le ared(x, \delta d) = pred(x,p)(\delta d) - \theta(x,p)(\delta d)\,\delta \qquad (11) \]

for every $x \in N(\bar\varepsilon)$ and $0 < \delta \le \bar\delta$.

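Let $s^*$ be an exact solution of $SUB(x,p;\delta)$. Then

\[ m(x,p)(s^*) \le m(x,p)(\delta d) \]

which implies that

\[ pred(x,p)(s^*) \ge pred(x,p)(\delta d) \]

for every $x \in N(\bar\varepsilon)$ and $0 < \delta \le \bar\delta$. From (11), we have

\[ pred(x,p)(s^*) \ge pred(x,p)(\delta d) \ge \frac{\eta}{2}\,\delta + \theta(x,p)(\delta d)\,\delta > 0 \]

for $\delta > 0$ sufficiently small. Since the right-hand side of the inequality

\[ \frac{\delta}{pred(x,p)(s^*)} \le \frac{\delta}{\frac{\eta}{2}\delta + \theta(x,p)(\delta d)\,\delta} = \frac{2}{\eta + 2\,\theta(x,p)(\delta d)} \]

tends to the constant $2/\eta$ as $\delta$ tends to 0, it follows that the quotient
$\delta / pred(x,p)(s^*)$ is bounded above by a constant multiple of $1/\eta$
for small $\delta$. For any $s$ obtained from STEP 1 of the basic TR algorithm for
the subproblem $SUB(x,p;\delta)$, (10) can be written as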


\[ r(x,p)(s) = 1 - \theta(x,p)(s)\,\frac{\|s\|}{\delta}\cdot\frac{\delta}{pred(x,p)(s^*)}\cdot\frac{f(x) - m(x,p)(s^*)}{f(x) - m(x,p)(s)}. \qquad (12) \]

By conditions (2) and (3) satisfied by $s$, we have

\[ \frac{f(x) - m(x,p)(s^*)}{f(x) - m(x,p)(s)} \le \frac{1}{\tau} \]

and the factor $\|s\|/\delta$ remains bounded. Since the product in the expression
(12) converges to 0 as $\delta$ converges to 0, it follows that $r(x,p)(s) \to 1$ as
$\delta \to 0$. Therefore, under Assumptions 2.1 through 2.5, the ratio $r(x,p)(s)$
can be made arbitrarily close to 1 if $\bar\varepsilon$ and $\bar\delta$ are sufficiently small.
Thus we have $r(x,p)(s) \ge c_0$ for every $x \in N(\bar\varepsilon)$ and $0 < \delta \le \bar\delta$,
where $c_0 \in (0,1)$ and $s$ satisfies the two conditions (2) and (3) for
$SUB(x,p;\delta)$. □
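
The behaviour $r(x,p)(s) \to 1$ as $\delta \to 0$ at a non-stationary point can be seen on a small example. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the hypothetical one-dimensional objective $f(x) = |x^2 - 1|$ with model $m(x)(s) = |x^2 - 1 + 2xs|$ (no parameter vector), solves the subproblem exactly, and evaluates the ratio at $x = 2$.

```python
# Illustration of Theorem 4.1 on f(x) = |x^2 - 1| with the model
# m(x)(s) = |(x^2 - 1) + 2 x s|; the example is an illustrative assumption.

def f(x):
    return abs(x * x - 1.0)

def model(x, s):
    return abs((x * x - 1.0) + 2.0 * x * s)

def exact_step(x, delta):
    """Exact minimizer of the model over |s| <= delta (one-dimensional case)."""
    candidates = [-delta, delta]
    if x != 0.0:
        s0 = -(x * x - 1.0) / (2.0 * x)   # zero of the linearization
        if abs(s0) <= delta:
            candidates.append(s0)
    return min(candidates, key=lambda s: model(x, s))

def ratio(x, delta):
    """r = ared / pred for the exact subproblem solution."""
    s = exact_step(x, delta)
    ared = f(x) - f(x + s)
    pred = f(x) - model(x, s)
    return ared / pred

deltas = [0.5, 0.1, 0.01]
ratios = [ratio(2.0, d) for d in deltas]
```

For this example the ratio works out to $1 - \delta/4$ whenever $\delta < 3/4$, so it exceeds any fixed $c_0 \in (0,1)$ once $\delta$ is small enough.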

To put the global convergence theorem on a solid basis, we specify the
updating rule for the TR radius $\delta$ in STEP 3 of the basic TR algorithm. Let
$0 < c_0 < c_1 < 1$ and $0 < \rho_0 < 1 < \rho_1$ be given constants. At the $k$-th
iteration, the basic TR algorithm becomes

STEP 1  approximately solve $SUB(x_k,p_k;\delta_k)$ to obtain $s_k$ satisfying (2) and
(3);

STEP 2  compute the ratio $r_k$ according to (4);

STEP 3  if $r_k < c_0$, let $x_{k+1} := x_k$, $p_{k+1} := p_k$, $\delta_{k+1} := \rho_0\delta_k$ and go to
STEP 1;
otherwise, let $x_{k+1} := x_k + s_k$, update $p_k$ to $p_{k+1}$ and update $\delta_k$ to $\delta_{k+1}$:

\[ \delta_{k+1} := \begin{cases} \delta_k & \text{if } c_0 \le r_k < c_1, \\ \rho_1\delta_k & \text{if } c_1 \le r_k. \end{cases} \]

In the successful TR iterations where $r_k \ge c_0$, we set $x_{k+1} = x_k + s_k$ and
$s_k \ne 0$. In the unsuccessful TR iterations where $r_k < c_0$, we set $x_{k+1} = x_k$
and reduce the TR radius $\delta_k$ to $\rho_0\delta_k$. The feature of the above updating
strategy is that the next trial radius $\delta_{k+1}$ is not reduced in the successful
iterations, but that $\delta_k$ approaches 0 for an infinite sequence of unsuccessful
steps. Thus the strategy to update the TR radius employed in STEP 3 is

\[ \delta_{k+1} = \begin{cases} \rho_0\delta_k, & \text{if } r_k < c_0, \\ \delta_k, & \text{if } c_0 \le r_k < c_1, \\ \rho_1\delta_k, & \text{if } c_1 \le r_k. \end{cases} \]
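
The three-way radius update can be written as a small function. This is a sketch only: the default values chosen for $c_0$, $c_1$, $\rho_0$ and $\rho_1$ are hypothetical and merely satisfy $0 < c_0 < c_1 < 1$ and $0 < \rho_0 < 1 < \rho_1$.

```python
# Sketch of the STEP 3 radius update; the default constants are
# illustrative assumptions satisfying 0 < c0 < c1 < 1 and 0 < rho0 < 1 < rho1.

def update_radius(r_k, delta_k, c0=0.1, c1=0.75, rho0=0.5, rho1=2.0):
    """Return (next radius, whether the step is accepted)."""
    if r_k < c0:
        return rho0 * delta_k, False    # unsuccessful: shrink, keep x_k
    if r_k < c1:
        return delta_k, True            # successful: keep the radius
    return rho1 * delta_k, True         # very successful: enlarge the radius

shrunk = update_radius(0.05, 1.0)
kept = update_radius(0.5, 1.0)
grown = update_radius(0.9, 1.0)
```

Note that the radius returned after an accepted step is never smaller than the current one, which is the property used as inequality (13) in the proof of Lemma 4.2.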


The  following  lemma  is  needed  in  the  proof  of  the  convergence  theorem. 


Lemma 4.2 Let $\{(x_k,p_k)\}$ be a sequence generated by the basic TR
algorithm with the updating strategy specified above and let $x^*$ be an
accumulation point of $\{x_k\}$ where $x^*$ is not a stationary point of $f$ in (1). Under
Assumptions 2.1 through 2.5, for every convergent subsequence $x_{k_i} \to x^*$ in
$\{x_k\}$ with the ratio $r(x_{k_i},p_{k_i})(s_{k_i}) \ge c_0 > 0$, we have

\[ \liminf_{i \to \infty} \delta_{k_i} > 0, \]

i.e., there exists $\beta > 0$ such that $\delta_{k_i} \ge \beta$ for $i$ large enough.


Proof. Since the iterates remain the same in the unsuccessful iterations,
we cancel the repeated unsuccessful iterates in the sequence $\{(x_k,p_k)\}$ so that
the sequence $\{(x_k,p_k)\}$ only consists of successful iterates. Let $t_k \ge 0$ denote
the number of TR radius reductions to get the successful iterate $x_k$, i.e., we
have $t_k$ unsuccessful iterations before we obtain the successful iterate $x_k$. The
superscript $(m)$ is used to stand for the $m$-th radius reduction. According to
the update for the next radius,

\[ \delta_k^{(m+1)} = \rho_0\,\delta_k^{(m)}, \quad m = 0, \ldots, t_k - 1, \]
\[ \delta_k = \delta_k^{(t_k)}, \]

and

\[ r_k^{(m)} < c_0, \quad m = 0, \ldots, t_k - 1, \]
\[ r_k = r_k^{(t_k)} \ge c_0. \]

In each successful iteration, we have an initial trial radius for the next iteration

\[ \delta_{k+1}^{(0)} = \begin{cases} \delta_k, & \text{if } c_0 \le r_k < c_1, \\ \rho_1\delta_k, & \text{if } c_1 \le r_k. \end{cases} \]

Hence the inequality

\[ \delta_{k+1}^{(0)} \ge \delta_k \qquad (13) \]

holds for successful iterations. This reorganization of the sequence $\{(x_k,p_k)\}$
makes no change to the subsequence $\{(x_{k_i},p_{k_i})\}$ because it only consists of
successful iterates. Let $K_1 = \{0,1,2,\ldots\} \supset K_2 = \{k_i\}$.

From Theorem 4.1, there exist a neighborhood
$N^* = N(x^*;\varepsilon^*) = \{x \mid \|x - x^*\| \le \varepsilon^*\}$ and a constant $\delta^* > 0$,
such that $r(x,p)(s) \ge c_0$ for every $x \in N^*$ and $0 < \delta \le \delta^*$, where $s$
satisfies the two conditions (2) and (3) for the subproblem $SUB(x,p;\delta)$. For
$k \in K_2$ large enough that $x_k \in N(x^*;\varepsilon^*)$, if the initial trial radius
$\delta_k^{(0)} > \delta^*$ for every $k \in K_2$, then $\delta_k^{(0)}$ may need to be
reduced $t_k$ times until $r_k = r_k^{(t_k)} \ge c_0$. Since $r_k^{(t_k - 1)} < c_0$ and
$\delta_k^{(t_k - 1)} > \delta^*$, the radius

\[ \delta_k = \delta_k^{(t_k)} = \rho_0\,\delta_k^{(t_k - 1)} > \rho_0\,\delta^* > 0 \]

is bounded below by a positive number for every $k \in K_2$ large enough.
Otherwise there exists a thinner subsequence $\{x_k\}$ for $k \in K_3 \subset K_2$ such
that the initial trial radius $\delta_k^{(0)} \le \delta^*$ for every $k \in K_3 \subset K_2$ large
enough. By Theorem 4.1, $r_k^{(0)} \ge c_0$ for $k \in K_3$. Hence no reduction in the
radius is needed, i.e., $t_k = 0$ and $\delta_k = \delta_k^{(0)}$ for every $k \in K_3$. We omit
the superscripts on $\delta_k$ and $r_k$ for $k \in K_3$ in the rest of the proof.

Suppose

\[ \lim_{k \in K_3} \delta_k = 0. \]

From Theorem 4.1 for each constant $c_1 > c_0$, there exist a neighborhood
$N^{**} = N(x^*;\varepsilon^{**}) = \{x \mid \|x - x^*\| \le \varepsilon^{**}\} \subset N^*$ and a constant
$\delta^{**} > 0$ with $0 < \varepsilon^{**} \le \varepsilon^*$ and $0 < \delta^{**} \le \delta^*$, such that
$r(x,p)(s) \ge c_1$ for every $x \in N^{**}$ and $0 < \delta \le \delta^{**}$, where $s$ satisfies
(2) and (3) for $SUB(x,p;\delta)$. For $k \in K_3$ large enough, we have $x_k \in N^{**}$ and
$\delta_k \le \delta^{**}$.


Let $K_3 = \{\ldots, i_j, \ldots\}$. Consider the TR iterations between $i_{j-1} \in K_3$
and $i_j \in K_3$ for $j$ large enough that

\[ x_{i_j} \in N^{**}, \quad \delta_{i_j} \le \delta^{**}, \]

i.e., the ones between two successive iterates in the subsequence $K_3$. If the
last iterate $x_{i_j - 1}$ before $x_{i_j}$ is not in $N^{**}$, i.e.,

\[ \|x_{i_j - 1} - x^*\| > \varepsilon^{**}, \]

since $x_{i_j}$ is very close to $x^*$ for $j$ large enough that

\[ \|x_{i_j} - x^*\| \le \tfrac{1}{2}\varepsilon^{**}, \]

then, by (13),

\[ \delta_{i_j} = \delta_{i_j}^{(0)} \ge \delta_{i_j - 1} \ge \|s_{i_j - 1}\| = \|x_{i_j} - x_{i_j - 1}\|
   \ge \|x_{i_j - 1} - x^*\| - \|x^* - x_{i_j}\| \ge \tfrac{1}{2}\varepsilon^{**}, \]

which contradicts $\delta_k \to 0$ for $k \in K_3$ as $k \to \infty$. Hence $x_{i_j - 1} \in N^{**}$
for $j$ large enough.

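Assume that the last $q_j \ge 1$ iterates before $x_{i_j}$ are in $N^{**}$, i.e.,

\[ x_{i_j - 1}, \ldots, x_{i_j - q_j} \in N^{**}, \]

and $x_{i_j - q_j - 1}$ is outside $N^{**}$ for $j$ large enough that

\[ \delta_{i_j} \le \delta^{**}. \]

According to the updating strategy, (13) becomes

\[ \delta_{i_j - m + 1}^{(0)} \ge \delta_{i_j - m}, \quad m = 1, \ldots, q_j. \]

Again, if some of these radii are reduced, say at the index $i_j - m_0$ with
$1 \le m_0 \le q_j$, which is the largest index among them, i.e.,

\[ \delta_{i_j - m_0} < \delta_{i_j - m_0}^{(0)}, \]

then similarly

\[ \delta_{i_j - m_0} > \rho_0\,\delta^*. \]

Therefore

\[ \delta_{i_j} \ge \delta_{i_j - m_0} > \rho_0\,\delta^*, \]

which contradicts $\delta_k \to 0$ for $k \in K_3$ as $k \to \infty$. Thus we only need to
consider the worst case

\[ \delta_{i_j - m} = \delta_{i_j - m}^{(0)}, \quad m = 1, \ldots, q_j, \]

which means that the initial trial radius in these iterations is so small that
no reduction is necessary. We also omit the superscripts in these iterations.
Since $\delta_{i_j} \le \delta^{**}$ and

\[ \delta_{i_j - m + 1} \ge \delta_{i_j - m}, \quad m = 1, \ldots, q_j, \]

we have

\[ \delta_{i_j - m} \le \delta^{**}, \quad m = 0, 1, \ldots, q_j. \]

From Theorem 4.1,

\[ r_{i_j - m} \ge c_1, \quad m = 1, \ldots, q_j. \]

According to the updating strategy,

\[ \delta_{i_j - m + 1} = \rho_1\,\delta_{i_j - m}, \quad m = 1, \ldots, q_j, \]

which implies

\[ \delta_{i_j - m} = \frac{1}{\rho_1^{\,m}}\,\delta_{i_j}, \quad m = 1, \ldots, q_j, \]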


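with $\rho_1 > 1$. Since, by (13),

\[ \delta_{i_j - q_j} = \delta_{i_j - q_j}^{(0)} \ge \delta_{i_j - q_j - 1}, \]

we also have

\[ \delta_{i_j - q_j - 1} \le \frac{1}{\rho_1^{\,q_j}}\,\delta_{i_j}. \]

Thus

\[ \sum_{m=0}^{q_j + 1} \delta_{i_j - m}
   \le \sum_{m=0}^{q_j} \frac{\delta_{i_j}}{\rho_1^{\,m}} + \frac{\delta_{i_j}}{\rho_1^{\,q_j}}
   \le 2\,\delta_{i_j} \sum_{m=0}^{\infty} \Big(\frac{1}{\rho_1}\Big)^m
   = 2\,\delta_{i_j}\,\frac{\rho_1}{\rho_1 - 1}. \]

However, since $x_{i_j - q_j - 1}$ is not in $N^{**}$, i.e.,

\[ \|x_{i_j - q_j - 1} - x^*\| > \varepsilon^{**}, \]

and $j$ is large enough that

\[ \|x_{i_j} - x^*\| \le \tfrac{1}{2}\varepsilon^{**}, \]

we have

\[ \sum_{m=0}^{q_j + 1} \delta_{i_j - m}
   \ge \sum_{m=1}^{q_j + 1} \|s_{i_j - m}\|
   \ge \Big\| \sum_{m=1}^{q_j + 1} s_{i_j - m} \Big\|
   = \|x_{i_j} - x_{i_j - q_j - 1}\|
   \ge \|x_{i_j - q_j - 1} - x^*\| - \|x_{i_j} - x^*\| \ge \tfrac{1}{2}\varepsilon^{**}. \]

Therefore

\[ \delta_{i_j} \ge \frac{\rho_1 - 1}{4\rho_1}\,\varepsilon^{**}, \]

which contradicts $\delta_k \to 0$ for $k \in K_3$ as $k \to \infty$. This completes the proof. □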


We  now  present  our  global  convergence  theorem. 
Theorem 4.3 Under Assumptions 2.1 through 2.5, if $x^*$ is an accumulation
point of $\{x_k\}$ generated by the basic TR algorithm, then $x^*$ is a stationary
point of $f$.

Proof. Suppose that a subsequence $\{x_{k_i}\}$ approaches $x^*$ which is not a
stationary point of $f$. An unsuccessful iterate in the subsequence can be
substituted by the same successful iterate because unsuccessful iterates remain
the same and make no progress in the basic TR algorithm. If there are some
repeated iterates in the subsequence after the substitution, cancel the repeated
ones. We still use the same notation $\{x_{k_i}\}$ to represent the substituted and
condensed subsequence which only consists of different successful iterates and
approaches $x^*$. Since the parameter vectors $\{p_{k_i}\}$ are bounded by
Assumption 2.5, there must exist a thinner subsequence $\{p_k\}$ where
$k \in K' = \{k_{i_j}\}$ such that $p_k \to p^*$ for $k \in K'$. It is worth pointing out
that $p^*$ is not necessarily the parameter the modeling technique would associate
with $x^*$. Thus $x_k \to x^*$ and $p_k \to p^*$ for $k \in K'$. According to Lemma 4.2,
there exists a constant $\beta > 0$ such that

\[ \delta_k \ge \beta > 0 \]

for $k \in K'$ large enough.

Since $r(x_k,p_k)(s_k) \ge c_0$ where $s_k$ satisfies conditions (2) and (3) for
$SUB(x_k,p_k;\delta_k)$ and $k \in K'$, it follows that

\[ f(x_k) - f(x_{k+1}) \ge c_0\,[f(x_k) - m(x_k,p_k)(s_k)]. \qquad (14) \]

In order to derive a lower bound for the right-hand side of (14), consider the
subproblem $SUB(x^*,p^*;\beta)$, and call an exact solution $s^*$. Since $x^*$ is not a
stationary point of $f$, by Lemma 2.12, $s^* \ne 0$ and

\[ f(x^*) - m(x^*,p^*)(s^*) = \eta^* > 0. \]

The regularity of $f$ implies that $f$ is continuous for every $x \in L_0$. From
Assumption 2.4, $m(x,p)(s)$ is continuous in $(x,p)$ for every $s \in R^n$. For
$k \in K'$ large enough, therefore, we have

\[ f(x_k) - m(x_k,p_k)(s^*) \ge \frac{\eta^*}{2}, \]

where

\[ \|s^*\| \le \beta \le \delta_k, \]

which shows that $s^*$ is also feasible for $SUB(x_k,p_k;\delta_k)$. By (3), $s_k$ obtained
from the subproblem $SUB(x_k,p_k;\delta_k)$ satisfies

\[ f(x_k) - m(x_k,p_k)(s_k) \ge \tau\,[f(x_k) - m(x_k,p_k)(s_k^*)]
   \ge \tau\,[f(x_k) - m(x_k,p_k)(s^*)]
   \ge \tau\,\frac{\eta^*}{2}, \]

where $s_k^*$ is the exact solution of $SUB(x_k,p_k;\delta_k)$ and $k \in K'$. By (14), it
follows that

\[ f(x_k) - f(x_{k+1}) \ge c_0\,\tau\,\frac{\eta^*}{2} \qquad (15) \]

for $k$ large enough and $k \in K'$.

However, since the series with positive terms

\[ \sum_{j=1}^{\infty} [f(x_{k_{i_j}}) - f(x_{k_{i_j}+1})]
   \le \sum_{j=1}^{\infty} [f(x_{k_{i_j}}) - f(x_{k_{i_{j+1}}})]
   = f(x_{k_{i_1}}) - f(x^*) < +\infty \]

is convergent, we have

\[ f(x_{k_{i_j}}) - f(x_{k_{i_j}+1}) \to 0, \quad \text{as } j \to \infty. \]

This contradicts (15) and completes the proof. □

We  can  also  state  our  convergence  theorem  in  the  following  manner. 


Corollary 4.4 Under Assumptions 2.1 through 2.5, if

1.  the level set $L_0 = \{x \in R^n \mid f(x) \le f(x_0)\}$ is bounded, where
$x_0$ is the starting point of the TR iteration,

or

2.  the sequence $\{x_k\}$ generated by the basic TR algorithm is bounded,

then the sequence has at least one accumulation point, and every
accumulation point of $\{x_k\}$ is a stationary point of $f$.
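
To make the statement concrete, the following minimal sketch runs the basic TR loop on a hypothetical one-dimensional problem, $f(x) = |x^2 - 1|$ with model $m(x)(s) = |x^2 - 1 + 2xs|$ and no parameter vector. The subproblem is solved exactly, the constants $c_0, c_1, \rho_0, \rho_1$ are illustrative choices, and the stopping rule (predicted reduction below a tolerance) is an assumption of this sketch rather than part of the algorithm as stated.

```python
# Sketch of the basic TR iteration on f(x) = |x^2 - 1|.
# The example problem, constants and stopping rule are illustrative assumptions.

def f(x):
    return abs(x * x - 1.0)

def model(x, s):
    return abs((x * x - 1.0) + 2.0 * x * s)

def exact_step(x, delta):
    """Exact minimizer of the model over |s| <= delta."""
    candidates = [-delta, delta]
    if x != 0.0:
        s0 = -(x * x - 1.0) / (2.0 * x)
        if abs(s0) <= delta:
            candidates.append(s0)
    return min(candidates, key=lambda s: model(x, s))

def basic_tr(x, delta, c0=0.1, c1=0.75, rho0=0.5, rho1=2.0,
             tol=1e-12, max_iter=100):
    for _ in range(max_iter):
        s = exact_step(x, delta)            # STEP 1
        pred = f(x) - model(x, s)
        if pred <= tol:                     # model predicts no decrease: stop
            break
        r = (f(x) - f(x + s)) / pred        # STEP 2
        if r < c0:                          # STEP 3, unsuccessful
            delta *= rho0
        else:                               # STEP 3, successful
            x = x + s
            if r >= c1:
                delta *= rho1
    return x

x_final = basic_tr(2.0, 1.0)
```

Starting from $x_0 = 2$ the level set is bounded, and the iterates approach $x = 1$, a stationary point (here a global minimizer) of $f$.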
5  Conclusions


In this paper we have identified five reasonable assumptions, i.e.,
Assumptions 2.1 through 2.5, which have allowed us to produce a general global
convergence theory for trust region methods for nonsmooth optimization.
We have demonstrated that this theory can be viewed as a unified approach
to convergence analysis by showing that a global convergence theory for each
of four very distinct TR applications in the literature can be obtained as
special cases of our general approach. In two of these applications we were
forced to make stronger assumptions, but produced a stronger convergence
theory.

In the cases studied in Section 3, the parameters in the TR local models
could represent information related to first derivatives, second derivatives, or
Lagrange multipliers. As a unified approach, we assumed the boundedness
of these parameters in Assumption 2.5. This boundedness can be derived
in many TR method applications from parameter updating strategies. The
boundedness of the parameters is employed in the proof of Theorem 4.3 to
guarantee the existence of a convergent subsequence. In particular
applications, for example Powell (1984) and Yuan (1983), it may be possible to
establish a convergence theory without assuming bounded parametric
information. This boundedness assumption seems to be the price we had to pay
for establishing a general theory.